20 research outputs found

    Interlingual Lexical Organisation for Multilingual Lexical Databases in NADIA

    We propose a lexical organisation for multilingual lexical databases (MLDB). This organisation is based on acceptions (word-senses). We detail this lexical organisation and show a mock-up built to experiment with it. We also present our current work in defining and prototyping a specialised system for the management of acception-based MLDB. Keywords: multilingual lexical database, acception, linguistic structure. Comment: 5 pages, Macintosh Postscript, published in COLING-94, pp. 278-28

    An Interlingual Lexical Organisation Based on Acceptions. From the PARAX mock-up to the NADIA system.

    Many projects aim to develop multilingual lexical databases. Some of these projects use an interlingual approach (KBMT-89, EDR, ...), whereas others choose a bilingual approach (Multilex, ...)

    METEOR For Multiple Target Languages Using DBnary

    This paper proposes an extension of METEOR, a well-known MT evaluation metric, to multiple target languages using an in-house lexical resource called DBnary (an extraction from Wiktionary provided to the community as Multilingual Lexical Linked Open Data). Today, the synonymy module of METEOR is used only when English is the target language (through WordNet). A synonymy module based on DBnary would allow its use for the 21 languages covered so far as target languages. The code of this new instance of METEOR, adapted to several target languages, is provided to the community via a GitHub repository. We also show that our DBnary-augmented METEOR increases the correlation with human judgements on the WMT 2013 and 2014 metrics datasets for the English-to-(French, Russian, German, Spanish) language pairs

    Cross-Lingual Link Discovery for Under-Resourced Languages

    CC BY-NC 4.0. In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges, experiences and prospects of their application to under-resourced languages. We first introduce the goals of cross-lingual linking and associated technologies, and in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied to language data can play in this context. We define under-resourced languages with a specific focus on languages actively used on the internet, i.e., languages with a digitally versatile speaker community, but limited support in terms of language technology. We argue that, for languages for which considerable amounts of textual data and (at least) a bilingual word list are available, techniques for cross-lingual linking can be readily applied, and that these enable the implementation of downstream applications for under-resourced languages via the localisation and adaptation of existing technologies and resources

    Sublim : un systeme universel de bases lexicales multilingues et Nadia : sa specialisation aux bases lexicales interlingues par acceptions

    The aim of this thesis is the definition and development of a multilingual lexical database manager independent of applications and linguistic theories. It begins with a study of some dictionaries (printed or electronic) and some tools for the management of lexical databases, notably the MULTILEX ESPRIT project, considered the most advanced effort, although it has some limitations (transfer-based lexical architecture, linguistic structures coded as typed feature structures, etc.). The second part of this thesis is dedicated to the definition of SUBLIM, a multilingual lexical database management system which allows the specification of the lexical architecture (organization of the dictionaries) and of the linguistic architecture (organization of the linguistic information inside the dictionaries). This system does not constrain the kind of dictionaries or linguistic structures to be used. The third and last part of the thesis presents a specialization of this generic system for the management of multilingual lexical databases based on interlingual acceptions: NADIA. This approach is a generalization of some interlingual methods, such as the one used in the ULTRA multilingual translation system, and an alternative to the knowledge-based approach

    DBnary2Vec: Preliminary Study on Lexical Embeddings for Downstream NLP Tasks

    In this preliminary study, we experiment with the use of DBnary, a large lexical knowledge graph, to create word embeddings that could be used in downstream NLP tasks. Our gamble is that word embeddings created from lexical data (instead of language corpora) may exhibit fewer biases while still being usable as the first layer of deep learning approaches to NLP tasks. We tried a very basic method of creating embeddings from the lexical graph and evaluated (1) the intrinsic performance of the resulting embeddings on word similarity and word analogy test sets and their extrinsic quality in POS tagging and NER downstream tasks, along with (2) the biases they may exhibit. Such embeddings show promising performance, outperforming word2vec on a few specific tasks while still not on par on most others, but we confirm that they exhibit less bias overall

    UNL-French deconversion as transfer generation from an interlingua with possible quality enhancement through offline human interaction

    We present the architecture of the UNL-French deconverter, which "generates" from the UNL interlingua by first "localizing" the UNL form for French, within UNL, and then applying slightly adapted but classical transfer and generation techniques, implemented in GETA's Ariane-G5 environment, supplemented by some UNL-specific tools. Online interaction can be used during deconversion to enhance output quality and is now used for development purposes. We show how interaction could be delayed and embedded in the postedition phase, which would then interact not directly with the output text, but indirectly with several components of the deconverter. Interacting online or offline can improve the quality not only of the utterance at hand, but also of the utterances processed later, as various preferences may be automatically changed to let the deconverter "learn"

    On UNL as the future html of the linguistic content and the reuse of existing NLP components in UNL-related applications with the example of a UNL-French deconverter

    After 3 years of specifying the UNL (Universal Networking Language) language and prototyping deconverters from more than 12 languages and enconverters for about 4, the UNL project has opened to the community by publishing the specifications (v2.0) of the UNL language, intended to encode the meaning of NL utterances as semantic hypergraphs and to be used as a "pivot" representation in multilingual information and communication systems